151 research outputs found

    Gibberish, assistant, or master? Using tweets linking to news for extractive single-document summarization

    Single-document summarization is a challenging task. In this paper, we explore effective ways of using the tweets linking to news to generate an extractive summary of each document. We reveal the very basic value of tweets, which can be utilized by regarding every tweet as a vote for candidate sentences. Based on this finding, we resort to unsupervised summarization models, leveraging the linking tweets to master the ranking of candidate extracts via a random walk on a heterogeneous graph. The advantage is that we can use the linking tweets to opportunistically “supervise” the summarization with no need for reference summaries. Furthermore, we analyze the influence of the volume and latency of tweets on the quality of output summaries, since tweets come after the news release. Compared to a truly supervised summarizer unaware of tweets, our method achieves significantly better results with a reasonably small tradeoff in latency; compared to the same summarizer using tweets as auxiliary features, our method is comparable while needing fewer tweets and much less time to achieve significant outperformance.
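
The "every tweet is a vote" idea in the abstract above can be illustrated with a small sketch. This is not the authors' model: the tokenizer, the Jaccard overlap measure, and the function names `tokenize` and `rank_by_tweet_votes` are all assumptions chosen for illustration, and the heterogeneous-graph random walk is not reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_tweet_votes(sentences, tweets):
    """Rank sentence indices by the number of tweet votes they receive.

    Each tweet casts one vote for the sentence with the highest Jaccard
    overlap (an assumed similarity measure); tweets that share no token
    with any sentence do not vote.
    """
    sentence_tokens = [tokenize(s) for s in sentences]
    votes = Counter()
    for tweet in tweets:
        tweet_tokens = tokenize(tweet)
        best_index, best_score = None, 0.0
        for i, tokens in enumerate(sentence_tokens):
            union = tweet_tokens | tokens
            score = len(tweet_tokens & tokens) / len(union) if union else 0.0
            if score > best_score:
                best_index, best_score = i, score
        if best_index is not None:
            votes[best_index] += 1
    # Sort by votes (descending), breaking ties by document order.
    return sorted(range(len(sentences)), key=lambda i: (-votes[i], i))
```

Taking the first few indices of the returned ranking gives a simple extractive summary; a real system would likely need a stronger similarity measure and some handling of redundant sentences.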

    Utilizing microblogs for improving automatic news highlights extraction

    DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dense Symptom Representations

    A diagnosis-oriented dialogue system queries the patient's health condition and makes predictions about possible diseases through continuous interaction with the patient. A few studies use reinforcement learning (RL) to learn the optimal policy from the joint action space of symptoms and diseases. However, existing RL (or non-RL) methods cannot achieve sufficiently good prediction accuracy and remain far from the upper limit. To address this problem, we propose a decoupled automatic diagnostic framework, DxFormer, which divides the diagnosis process into two steps: symptom inquiry and disease diagnosis, where the transition from symptom inquiry to disease diagnosis is explicitly determined by the stopping criteria. In DxFormer, we treat each symptom as a token and formalize symptom inquiry and disease diagnosis as a language generation model and a sequence classification model, respectively. We use the inverted version of the Transformer, i.e., the decoder-encoder structure, to learn the representation of symptoms by jointly optimizing the reinforce reward and cross-entropy loss. Extensive experiments on three public real-world datasets prove that our proposed model can effectively learn doctors' clinical experience and achieve state-of-the-art results in terms of symptom recall and diagnostic accuracy. Comment: 7 pages, 4 figures, 3 tables

    Using tweets to help sentence compression for news highlights generation

    We explore using relevant tweets of a given news article to help sentence compression for generating compressive news highlights. We extend an unsupervised dependency-tree-based sentence compression approach by incorporating tweet information to weight the tree edges in terms of informativeness and syntactic importance. The experimental results on a public corpus that contains both news articles and relevant tweets show that our proposed tweet-guided sentence compression method can improve the summarization performance significantly compared to the baseline generic sentence compression method.

    Using content-level structures for summarizing microblog repost trees

    A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework by effectively differentiating two kinds of messages on repost trees called leaders and followers, which are derived from content-level structure information, i.e., contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect leaders across repost tree paths. We then present a variant of a random-walk-based summarization model to rank and select salient messages based on the result of leader detection. To reduce the error propagation cascaded from leader detection, we improve the framework by enhancing the random walk with adjustment steps for sampling from leader probabilities given all the reposting messages. For evaluation, we construct two annotated corpora, one for leader detection, and the other for repost tree summarization. Experimental results confirm the effectiveness of our method.
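
To make the idea of a random walk guided by leader probabilities more concrete, here is a minimal sketch. It is a generic restart-biased (personalized PageRank style) walk, swapped in for illustration, and not the adjustment-step variant the abstract describes. The function name `biased_random_walk`, the `damping` value, and the assumption that leader probabilities are already available (for example from a CRF) are all hypothetical.

```python
def biased_random_walk(similarity, leader_prob, damping=0.85, iterations=100):
    """Score messages with a random walk that restarts according to leader probabilities.

    similarity: square list of lists of non-negative edge weights between messages.
    leader_prob: per-message probability of being a leader (assumed to be given).
    """
    n = len(similarity)
    total = sum(leader_prob)
    # Restart distribution: normalized leader probabilities (uniform if all are zero).
    restart = [p / total for p in leader_prob] if total > 0 else [1.0 / n] * n
    # Row-normalize the weights into transition probabilities;
    # a message with no outgoing weight jumps according to the restart distribution.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        transition.append([w / row_sum for w in row] if row_sum > 0 else list(restart))
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) * restart[j]
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores
```

Messages with the highest scores would be candidates for the summary. Because the walk restarts at likely leaders, a message that is both well connected and probably a leader is favored over an equally connected follower.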